MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads
Abstract
MapReduce has become a widely used computing model for large-scale data processing in clusters and data centers. A MapReduce workload generally contains multiple jobs. Because map tasks must execute before reduce tasks, different job execution orders in a MapReduce workload can yield significantly different performance and system utilization. This paper proposes a prototype system called MROrder that dynamically optimizes the job order for online MapReduce workloads. Moreover, MROrder is designed to be flexible across optimization metrics, e.g., makespan and total completion time. The experimental results show that MROrder is able to improve system performance by up to 31% for makespan and 176% for total completion time.
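The effect of job order on both metrics can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not MROrder's actual algorithm: each job is reduced to a hypothetical pair (map_time, reduce_time), the cluster is treated as a two-stage pipeline in which one job's map phase runs at a time followed by its reduce phase, and the reordering used is Johnson's rule, the classical makespan-optimal ordering for a two-machine flow shop. The names `simulate` and `johnson_order` and the sample durations are illustrative only.

```python
from itertools import permutations


def simulate(jobs):
    """Return (makespan, total_completion_time) for jobs run in the given order.

    Assumed model: each job is (map_time, reduce_time); map phases run back to
    back, and a job's reduce phase starts once its own map phase and the
    previous job's reduce phase have both finished.
    """
    map_end = 0
    reduce_end = 0
    total_completion = 0
    for map_time, reduce_time in jobs:
        map_end += map_time
        reduce_end = max(reduce_end, map_end) + reduce_time
        total_completion += reduce_end
    return reduce_end, total_completion


def johnson_order(jobs):
    """Order jobs by Johnson's rule for a two-stage flow shop."""
    # Jobs whose map phase is no longer than their reduce phase go first,
    # shortest map phase first.
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    # The remaining jobs go last, longest reduce phase first.
    back = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: j[1], reverse=True)
    return front + back


# Hypothetical workload: (map_time, reduce_time) per job, in arrival order.
jobs = [(4, 1), (1, 4), (3, 3)]

print(simulate(jobs))                 # arrival order: (12, 26)
print(simulate(johnson_order(jobs)))  # reordered:     (9, 22)
```

In this toy workload, reordering alone shortens the makespan from 12 to 9 time units and the total completion time from 26 to 22. Johnson's rule targets makespan only; a system that is flexible across metrics, as the abstract describes, would need a different ordering criterion when total completion time is the objective.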
Similar resources
MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs
MapReduce based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies who invent and adopt the technology from the very beginning, traditional enterprises demand easy-to-use software due to the limited capabilities of administrators. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In...
FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads
Originally, MapReduce implementations such as Hadoop employed First In First Out (FIFO) scheduling, but such simple schemes cause job starvation. The Hadoop Fair Scheduler (HFS) is a slot-based MapReduce scheme designed to ensure a degree of fairness among the jobs, by guaranteeing each job at least some minimum number of allocated slots. Our prime contribution in this paper is a different, fle...
Shared Execution of Recurring Workloads in MapReduce
With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commo...
OctopusDB: Flexible and Scalable Storage Management for Arbitrary Database Engines
We live in a dynamic age with the economy, the technology, and the people around us changing faster than ever before. Consequently, the data management needs in our modern world are much different than those envisioned by the early database inventors in the 70s. Today, enterprises face the challenge of managing ever-growing dataset sizes with dynamically changing query workloads. As a result, m...
Towards Understanding Cloud Performance Tradeoffs Using Statistical Workload Analysis and Replay
Cloud computing has given rise to a variety of distributed applications that rely on the ability to harness commodity resources for large scale computations. The inherent performance variability in these applications’ workload coupled with the system’s heterogeneity render ineffective heuristics-based design decisions such as system configuration, application partitioning and placement, and job...
Publication date: 2013